13 research outputs found

    FaceAtt: Enhancing Image Captioning with Facial Attributes for Portrait Images

    Full text link
    Automated image caption generation is a critical area of research that enhances accessibility and understanding of visual content for diverse audiences. In this study, we propose the FaceAtt model, a novel approach to attribute-focused image captioning that emphasizes the accurate depiction of facial attributes within images. FaceAtt automatically detects and describes a wide range of attributes, including emotions, expressions, pointed noses, fair skin tones, hair textures, attractiveness, and approximate age ranges. Leveraging deep learning techniques, we explore the impact of different image feature extraction methods on caption quality and evaluate our model's performance using metrics such as BLEU and METEOR. FaceAtt uses the annotated attributes of each portrait as supplementary prior knowledge before captioning. This addition yields a subtle yet discernible improvement in the resulting scores, showing the value of incorporating additional attribute vectors during training. Furthermore, our research contributes to the broader discourse on ethical considerations in automated captioning. This study sets the stage for future research in refining attribute-focused captioning techniques, with a focus on enhancing linguistic coherence, addressing biases, and accommodating diverse user needs.
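The abstract above evaluates captions with BLEU. As a rough illustration of what that metric measures, here is a minimal sentence-level sketch, assuming a single reference caption, n-grams up to length 2, and no smoothing. The function names and the example captions are hypothetical and are not taken from the FaceAtt paper, which may use a different BLEU configuration.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) in a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def modified_precision(candidate, reference, n):
    """Clipped n-gram precision of the candidate against one reference."""
    cand_counts = Counter(ngrams(candidate, n))
    ref_counts = Counter(ngrams(reference, n))
    total = sum(cand_counts.values())
    if total == 0:
        return 0.0
    # Each candidate n-gram is credited at most as often as it occurs in the reference.
    clipped = sum(min(count, ref_counts[gram]) for gram, count in cand_counts.items())
    return clipped / total


def bleu(candidate, reference, max_n=2):
    """Unsmoothed BLEU: brevity penalty times the geometric mean of the precisions."""
    precisions = [modified_precision(candidate, reference, n) for n in range(1, max_n + 1)]
    if min(precisions) == 0.0:
        return 0.0  # without smoothing, any zero precision zeroes the score
    log_mean = sum(math.log(p) for p in precisions) / max_n
    c, r = len(candidate), len(reference)
    brevity_penalty = 1.0 if c > r else math.exp(1.0 - r / c)
    return brevity_penalty * math.exp(log_mean)


# Hypothetical captions, for illustration only.
reference = "a young woman smiles".split()
candidate = "a woman smiles".split()
print(round(bleu(candidate, reference), 3))
```

A candidate identical to the reference scores 1.0. Shorter or partially matching captions score lower, which is why small gains from added attribute vectors show up as small score differences.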

    BdSpell: A YOLO-based Real-time Finger Spelling System for Bangla Sign Language

    Full text link
    In the domain of Bangla Sign Language (BdSL) interpretation, prior approaches often imposed a burden on users, requiring them to spell words without hidden characters, which were subsequently corrected using Bangla grammar rules due to the missing classes in the BdSL36 dataset. However, this method posed a challenge in accurately guessing the incorrect spelling of words. To address this limitation, we propose a novel real-time finger spelling system based on the YOLOv5 architecture. Our system employs specified rules and numerical classes as triggers to efficiently generate hidden and compound characters, eliminating the necessity for additional classes and significantly enhancing user convenience. Notably, our approach achieves character spelling in 1.32 seconds with an accuracy rate of 98%. Furthermore, our YOLOv5 model, trained on 9147 images, achieves a mean Average Precision (mAP) of 96.4%. These advancements represent a substantial progression in augmenting BdSL interpretation, promising increased inclusivity and accessibility for the linguistic minority. This framework, characterized by compatibility with existing YOLO versions, stands as a milestone in enhancing communication modalities and linguistic equity within the Bangla Sign Language community.

    Flow field analysis of a pentagonal-shaped bridge deck by unsteady RANS

    No full text
    Long-span cable-stayed bridges are susceptible to dynamic wind effects due to their inherent flexibility. The fluid flow around the bridge deck should be well understood for the efficient design of an aerodynamically stable long-span bridge system. In this work, the aerodynamic features of a pentagonal-shaped bridge deck are explored numerically. The analytical results are compared with past experimental work to assess the capability of two-dimensional unsteady RANS simulation for predicting the aerodynamic features of this type of deck. The influence of the bottom plate slope on aerodynamic response and flow features was investigated. By varying the Reynolds number (2 × 10^4 to 20 × 10^4), the aerodynamic behavior at high wind speeds is clarified.
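To give a sense of scale for the Reynolds number range quoted above (read here as 2 × 10^4 to 20 × 10^4), the following minimal sketch converts Reynolds numbers to wind speeds using Re = U·B/ν. The kinematic viscosity of air and the 0.3 m deck width are assumed illustrative values, and the choice of deck width as the characteristic length is also an assumption; the abstract does not state either.

```python
# Assumed kinematic viscosity of air near 20 °C, in m^2/s (not from the abstract).
NU_AIR = 1.5e-5


def reynolds_number(wind_speed, char_length, nu=NU_AIR):
    """Re = U * B / nu for wind speed U (m/s) and characteristic length B (m)."""
    return wind_speed * char_length / nu


def wind_speed_for(reynolds, char_length, nu=NU_AIR):
    """Invert Re = U * B / nu to get the wind speed U (m/s)."""
    return reynolds * nu / char_length


# Hypothetical model-scale deck width in metres, for illustration only.
DECK_WIDTH = 0.3

for re in (2e4, 20e4):
    speed = wind_speed_for(re, DECK_WIDTH)
    print(f"Re = {re:.0e} -> U = {speed:.2f} m/s")
```

Under these assumed values the range corresponds to roughly 1 to 10 m/s at model scale. A different characteristic length or viscosity shifts the speeds proportionally.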

    Multi-class sentiment classification on Bengali social media comments using machine learning

    No full text
    Multi-class Sentiment Analysis (SA) is an important field of computational linguistics that extracts multiple opinions expressed in a text using NLP and text-mining techniques. Existing research on multi-class SA in the Bengali language is directed towards ternary classification with unsatisfactory classification performance. Obtaining a higher performance score is challenging due to the peculiarities of Bengali text, the lack of ground truth datasets, and the scarcity of preprocessing tools. Moreover, no research has shown that deep learning algorithms perform better on four types of sentiments. Therefore, we propose a supervised deep learning classifier based on CNN and LSTM to conduct multi-class SA on Bengali social media comments labelled as sexual, religious, political, and acceptable. The study aims to achieve maximum accuracy using the proposed model and provide a comparative analysis with the baseline models. Six machine learning models with two different feature extraction techniques served as baseline models. The proposed CLSTM architecture greatly improves SA performance, reaching 85.8% accuracy and an F1 score of 0.86 on a labelled dataset of 42,036 Facebook comments. A web application based on the proposed model and the highest-performing baseline model was built to detect the real-life sentiment of social media comments.

    A Comparative Analysis on Suicidal Ideation Detection Using NLP, Machine, and Deep Learning

    No full text
    Social networks are essential resources to obtain information about people’s opinions and feelings towards various issues as they share their views with their friends and family. Suicidal ideation detection via online social network analysis has emerged as an essential research topic with significant difficulties in the fields of NLP and psychology in recent years. With the proper exploitation of the information in social media, the complicated early symptoms of suicidal ideation can be discovered, which can save many lives. This study offers a comparative analysis of multiple machine learning and deep learning models to identify suicidal thoughts from the social media platform Twitter. The principal purpose of our research is to achieve better model performance than prior research works, to recognize early indications with high accuracy and help prevent suicide attempts. To this end, we applied text pre-processing and feature extraction approaches such as CountVectorizer and word embedding, and trained several machine learning and deep learning models. Experiments were conducted on a dataset of 49,178 instances retrieved from live tweets by 18 suicidal and non-suicidal keywords using the Python Tweepy API. Our experimental findings reveal that the RF model achieves the highest classification score among machine learning algorithms, with an accuracy of 93% and an F1 score of 0.92. However, training the deep learning classifiers with word embedding improves on the performance of the ML models, with the BiLSTM model reaching an accuracy of 93.6% and an F1 score of 0.93.
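The abstract above compares models by accuracy and F1 score. As a reminder of how those two numbers relate to a classifier's predictions, here is a minimal sketch for the binary case. The labels below are made-up toy data, not results from the study, and treating the task as binary with label 1 as the positive class is an assumption.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true positives, false positives, false negatives and true negatives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p != positive)
    return tp, fp, fn, tn


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fp + fn + tn)


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred, positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labels for illustration only (1 = flagged, 0 = not flagged).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
print(accuracy(y_true, y_pred), f1_score(y_true, y_pred))
```

Accuracy counts all correct predictions equally, while F1 focuses on the positive class, so the two can diverge when classes are imbalanced.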

    Theoretical Investigation on the Impact of Two HDR Dampers on First Modal Damping Ratio of Stay Cable

    No full text
    Stay cables are one of the vital components of a cable-stayed bridge. Due to their flexible nature, stay cables are vulnerable to external excitation and often vibrate with large amplitude under wind action, which leads to fatigue failure of the cables. To suppress such large-amplitude vibration by improving the damping ratio of the cable, various dampers such as magnetorheological dampers, friction dampers, oil dampers, or high damping rubber (HDR) dampers are utilized and have gained popularity over time. This paper focuses on improving the damping ratio of stay cables using a combination of two HDR dampers. First, a theoretical model is formulated considering cable bending stiffness to evaluate the damping effect of the cable-HDR damper system. Then, the impact of various design parameters of the HDR dampers on cable damping, considering the cable stiffness, is analyzed. The comparative analysis of results shows that the considered parameters, such as loss factor, spring factor, and installation location of the dampers, strongly affect the damping ratio of the stay cable. Finally, the optimal parameters of the two HDR dampers are proposed for damper design.

    Accessible Data Representation with Natural Sound

    No full text
    Sonification translates data into non-speech audio. Such auditory representations can make data visualization accessible to people who are blind or have low vision (BLV). This paper presents a sonification method for translating common data visualization into a blend of natural sounds. We hypothesize that people’s familiarity with sounds drawn from nature, such as birds singing in a forest, and their ability to listen to these sounds in parallel, will enable BLV users to perceive multiple data points being sonified at the same time. Informed by an extensive literature review and a preliminary study with 5 BLV participants, we designed an accessible data representation tool, Susurrus, that combines our sonification method with other accessibility features, such as keyboard interaction and text-to-speech feedback. Finally, we conducted a user study with 12 BLV participants and report the potential and application of natural sounds for sonification compared to existing sonification tools. https://doi.org/10.1145/3544548.358108